Virtual array method for 3D robotic vision

ABSTRACT

A pulsed light source illuminates a scene with a virtual array of points. Light reflected by the scene is detected by a small pixel array, allowing generation of a three-dimensional map of the scene. A processing element processing data output by the small pixel array uses a multipath resolution algorithm to resolve individual objects in the scene.

RELATED APPLICATION

This application is a division of U.S. patent application Ser. No. 16/287,411, filed Feb. 27, 2019, which claims priority to U.S. Provisional Application No. 62/786,906, filed Dec. 31, 2018, both of which are hereby incorporated by reference in their entirety.

TECHNICAL FIELD

The present disclosure relates to the field of robotic vision, and in particular to low-power, high-resolution three-dimensional (3D) robotic vision techniques.

BACKGROUND ART

The field of robotic vision has expanded tremendously in recent years. In both home and industrial markets, robotic vision is helping improve efficiency, safety and mobility. From home devices such as robotic vacuum cleaners to industrial assembly line robots, there is a need for 3D robotic vision. Autonomous vehicles such as drones and self-driving automobiles also have a great need for 3D robotic vision. While optical 3D sensing has the potential to provide the highest resolution compared with other sensing modalities such as ultrasound and millimeter wave technology, current 3D optical sensor devices rely on large sensor arrays and consume significant amounts of power, yet produce results with limited resolution.

SUMMARY

In one example, a system for three-dimensional robotic vision includes a light source and a transmit optical element. The transmit optical element is configured to project a virtual array pattern of laser points that illuminates a scene. The system also includes a receive imaging lens configured to receive reflected light from the scene and project the light onto a pixel array. The system further includes a processing element coupled to the pixel array and configured to generate a three-dimensional map of the scene from information generated by the pixel array responsive to receiving the reflected light.

BRIEF DESCRIPTION OF DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an implementation of apparatus and methods consistent with the present disclosure and, together with the detailed description, serve to explain advantages and principles consistent with the disclosure. In the drawings,

FIG. 1 is a block diagram illustrating a robotic vision technique employing uniform spatial modulation of light also known as flood illumination according to the prior art.

FIG. 2 is a block diagram illustrating a robotic vision technique employing discrete spatial modulation of light according to one embodiment.

FIG. 3 is an example of a light source, diffractive optical elements, and spatial modulation patterns according to one embodiment.

FIG. 4 is a pair of graphs illustrating differences between low intensity, low frequency time modulated light and high intensity, high frequency time modulated light for robotic vision according to one embodiment.

FIG. 5 is a graph illustrating resolving two sources according to one embodiment.

FIG. 6 is a graph illustrating the inability to resolve two sources according to another embodiment.

FIG. 7 is a block diagram illustrating a system for 3D robotic vision according to one embodiment.

DETAILED DESCRIPTION

In this description: (a) references to numbers without subscripts are understood to reference all instances of subscripts corresponding to the referenced number; and (b) reference to “one embodiment” or to “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the invention, and multiple references to “one embodiment” or “an embodiment” should not be understood as necessarily all referring to the same embodiment.

Although some of the following description is written in terms that relate to software or firmware, embodiments can implement the features and functionality described herein in software, firmware, or hardware as desired, including any combination of software, firmware, and hardware. References to daemons, drivers, engines, modules, or routines should not be considered as suggesting a limitation of the embodiment to any type of implementation.

FIG. 1 is a block diagram illustrating a conventional 3D robotic vision technique according to the prior art. Flood illumination using time modulated light is used to illuminate a scene 130, typically with an LED light source 110 and relevant optics 120. The modulation causes the light intensity to vary over time with a certain frequency, but the technique floods the entire scene with light for a predetermined amount of time. Light reflected by the scene is detected by a large pixel array 140, typically an 80×60 array of 4800 pixels in order to obtain a usable resolution. Manufacturing large pixel arrays is costly. In addition, each pixel must be processed and the data from the pixels must be converted into a usable format. The result is a large amount of computation, and therefore significant power, to construct the 3D image by performing time of flight calculations that compute the distance between each pixel and the source of the light reflected back to that pixel.
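By way of illustration only, the time of flight calculation mentioned above can be sketched as follows. The sketch assumes the usual round-trip relationship, in which the light travels to the reflecting surface and back, so the distance is half the product of the speed of light and the measured delay. The function name `tof_distance` is hypothetical and is not part of the described system.

```python
# Minimal time of flight sketch (assumption: the measured delay is the
# full round trip from emitter to surface and back to the pixel).
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def tof_distance(round_trip_seconds: float) -> float:
    """Return the one-way distance in metres for a measured round-trip delay."""
    if round_trip_seconds < 0:
        raise ValueError("round-trip delay cannot be negative")
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


# A delay of 20 ns corresponds to a surface roughly 3 m away.
distance_m = tof_distance(20e-9)
print(f"{distance_m:.3f} m")
```

In a flood illumination system a calculation of this kind would be repeated for every one of the 4800 pixels, which is one source of the computational load described above.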

This kind of flood illumination has an optical power loss proportional to

$\frac{1}{d^{2}} \quad \left( d = \text{distance} \right)$

in both the transmit and receive paths. Low intensity light may be used for safety and power consumption reasons, but low intensity light is optically power inefficient compared to using high intensity light illumination techniques described below. In addition, using the large pixel array 140, which is required to obtain the required resolution, is electrically power inefficient. Each sensor pixel in the large pixel array 140 would need a dedicated analog front end (AFE). Digitizing the large pixel array 140 is computationally intensive, which means that the power utilization of this prior art technique is large. In addition, the amount of time required to process all 4800 pixels means that slow data access can cause image distortion and artifacts in the digital images produced from the pixel data.

Other approaches, such as lidar systems, use motorized or optical scanning of a scene that requires complex mechanical apparatus, making them inappropriate for many types of 3D robotic vision applications.

FIG. 2 is a block diagram illustrating a system for 3D robotic vision according to one embodiment that uses discrete spatial modulation of light. In this example, a light source such as an array of four lasers 210 can transmit light through diffractive optical elements (DOEs) 220 to create the spatial modulation pattern.

FIG. 3 illustrates two elements that can be used in a pulsed system such as the system of FIG. 2 according to one embodiment. Chips 310 are examples of four-laser arrays, in which each laser can be configured to generate light independently at the same wavelength, typically in the infrared range. For example, a 4-channel 905 nm laser diode using InGaAs lasers provides a chip that allows four individually addressable lasers with no electronic or optical crosstalk, rise and fall times of under 1 ns, and optical power of 75 W. Table 1 below illustrates some of the reasons why a laser chip such as chip 310 may be preferable to a light-emitting diode (LED) light source:

TABLE 1

| | LED | Laser |
|---|---|---|
| Mode of operation | Spontaneous emission of light (non-coherent) | Stimulated emission of light (coherent) |
| Ability to focus | Cannot be focused to a narrow beam | Can be focused to a narrow beam |
| Electro-optical efficiency | Low ~10% | High ~40% |
| Modulation speed | Low ~10 MHz | High ~GHz |
| Pulsed operation | Long pulses ~100 ns | Short pulses <1 ns |
| Peak power | 4 W (low peak power) | 120 W (high peak power) |
| Average power | 2 W (high average power) | 120 mW (low average power) |
| Forward voltage drop | ~3 V | ~10 V |
| Line width (spectral width) | 30 nm @ 850 nm | 5 nm @ 905 nm |
| Emitter size (for same W) | Large (1 mm) | Small (100 μm) |
| Cost (M units) | <$0.10 | <$1 |

DOEs 320 are optical elements that diffract the light passing through them from a single coherent light source into a pattern of points. DOEs 320 are typically made of plastic or glass, and can be made inexpensively. DOEs 320 can be configured with any desired pattern, such as regular array pattern 330 or irregular pseudo-random pattern 340 illustrated in FIG. 3. Thus, one laser can become a multitude of points in the virtual array that illuminates the scene. Typically, each laser in the laser array 310 is split by a DOE 320 to illuminate the scene with a plurality of points, with each laser being turned on and off at different times from each of the other lasers in the laser array 310, producing illuminated points in a virtual array at different times. In some embodiments, a pulse rate of 100 MHz may be used for each of the lasers in the laser array 310. The pulse width is determined by the laser array 310, and a smaller pulse width, such as 1 ns, is preferable to keep energy requirements low.

Although described here in terms of laser light sources and DOEs, other light sources and other techniques for generating a virtual array of points can be used. For example, display devices based on optical micro-electro-mechanical technology using digital micro-mirrors, such as a DLP® imaging device from Texas Instruments, can be used instead of a DOE. Phased arrays can also be used for steering a light source to form a virtual array instead of a DOE.

Instead of flooding the scene, light from each of the four lasers can be diffracted into a collection of points, such as 4×4 array 225. Light reflected back from the scene from each of the 4×4 arrays of points 225 is captured by a single pixel of a small pixel array 230, such as a 300 pixel 20×15 array in one embodiment. At any given moment in time in this embodiment, the pixel sees only the four points from one of the four lasers of the laser array 310, but with a computational technique described below, all four points can be resolved.
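By way of illustration only, the relationship between the small pixel array and the virtual array of points can be sketched as follows. With a 20×15 pixel array and a 4×4 group of points per pixel, the virtual array has 80×60 = 4800 points, the same count as the large pixel array of FIG. 1. The assignment of each laser to one row of the 4×4 group, and the helper name `virtual_point`, are assumptions made only for this sketch; the description does not specify how the points of a group are divided among the lasers.

```python
# Sketch of virtual-array indexing (assumed layout: laser k lights row k
# of the 4x4 group of points seen by each pixel).
PIXEL_ROWS, PIXEL_COLS = 15, 20   # small pixel array, 300 pixels
GROUP = 4                         # 4x4 points per pixel
NUM_LASERS = 4                    # one time slot per laser


def virtual_point(pixel_row: int, pixel_col: int, laser: int, point: int):
    """Map a pixel, a laser time slot, and a resolved point to virtual-array coordinates."""
    if not (0 <= pixel_row < PIXEL_ROWS and 0 <= pixel_col < PIXEL_COLS):
        raise ValueError("pixel index out of range")
    if not (0 <= laser < NUM_LASERS and 0 <= point < GROUP):
        raise ValueError("laser or point index out of range")
    return pixel_row * GROUP + laser, pixel_col * GROUP + point


virtual_rows = PIXEL_ROWS * GROUP   # 60
virtual_cols = PIXEL_COLS * GROUP   # 80

# Every combination of pixel, laser, and point lands on a distinct virtual point.
all_points = {
    virtual_point(r, c, k, p)
    for r in range(PIXEL_ROWS)
    for c in range(PIXEL_COLS)
    for k in range(NUM_LASERS)
    for p in range(GROUP)
}
print(virtual_rows, virtual_cols, len(all_points))
```

Under these assumptions, four time slots of 300 pixel readings each, with four points resolved per reading, cover the full 4800-point virtual array.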

Although there is optical loss, by using a laser dot projection technique such as this, the optical power loss is proportional to

$\frac{1}{d^{2}}$

in only the receive path, so there is less optical power loss than in the technique illustrated in FIG. 1. Furthermore, because the light is spatially modulated as discrete points in the field-of-view, the optical intensity is higher at these locations than in the flood illumination approach of FIG. 1. This approach results in lower optical power requirements for the discrete spatial modulation technique of FIG. 2. In addition, the use of discrete spatial modulation allows the use of signal processing algorithms that can resolve multiple points using a single pixel. The reduced size of the small pixel array 230 simplifies the hardware of the pixel array considerably, saving power at the interface for the small pixel array 230 and providing the capability for faster image processing, which in turn leads to a reduction in image distortion.
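By way of illustration only, the difference in distance dependence between the two techniques can be sketched as follows. The sketch assumes that the transmit-path and receive-path factors multiply, that all other factors are equal, and that distance is expressed relative to an arbitrary reference distance of 1. None of these assumptions, nor the function names, come from the described system.

```python
# Relative distance scaling of the returned optical power (assumptions:
# path factors multiply, everything else equal, distance normalized to 1).

def flood_return_factor(d: float) -> float:
    """1/d^2 in the transmit path times 1/d^2 in the receive path."""
    return (1.0 / d**2) * (1.0 / d**2)


def dot_return_factor(d: float) -> float:
    """1/d^2 in the receive path only."""
    return 1.0 / d**2


def dot_advantage(d: float) -> float:
    """How much more of the light returns with dot projection at distance d."""
    return dot_return_factor(d) / flood_return_factor(d)


for d in (1.0, 2.0, 4.0):
    print(d, flood_return_factor(d), dot_return_factor(d), dot_advantage(d))
```

Under these assumptions the relative advantage of dot projection grows with the square of the distance, which is consistent with the lower optical power requirement noted above.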

A further aspect relates to the time modulation of the light, which is explained in FIG. 4. On the left is a graph 410 of light from a low intensity, low frequency modulated source, such as is used in the system of FIG. 1, and on the right is a graph 420 of light from a high intensity, high frequency modulated source, such as is used in the system of FIG. 2. For detectability, the received light at the receiver should have a signal-to-noise ratio (SNR) of more than 10-15 dB, implying a false alarm rate of <10%. The transmitted optical energy, E, for each measurement can be computed using the following equation

$E = N \cdot S \cdot T_{p}$

where N is the number of pulses, S is the peak optical signal power in watts, and $T_{p}$ is the duration of one pulse (half power, full width). For diodes, the peak optical power is proportional to current. As a result, when the noise in the system is dominated by the analog circuitry (e.g., photodiode shot noise, AFE noise, etc.), $SNR \propto S^{2} \cdot N \cdot T_{p}$. The safety of a light source is related to the total number of photons per second transmitted by the light source (i.e., the amount of optical power that can be transmitted onto a given field-of-view). As a result, the amplitude of the signal S can be calculated as

$S = \frac{E}{N \cdot T_{p}}$

where E=the energy of the light transmitted. Finally, the precision of the detector P is proportional to the ratio of the pulse width to the square root of the SNR, or

$P \propto {\frac{T_{p}}{\sqrt{SNR}}.}$

As a result, one can perform system tradeoffs for the two illumination techniques.

As an example of flood illumination, a robotic vision system requires an amplitude S=5 mA, a number of pulses N=20k, and a pulse width $T_{p}$=20 ns, leading to a total time $T_{f}$=4 ms. In contrast, the proposed discrete spatial modulation system illustrated in FIG. 2 illuminates discrete points in the field-of-view instead of the entire field-of-view. It can therefore use a higher peak power light source with an amplitude of S=8 A, but only five pulses, with a pulse width $T_{p}$=2 ns. This illumination approach results in an SNR that is 64 times (18 dB) higher than that of the flood illumination technique, an energy usage, E, that is 25 times lower, and a precision, P, that is 80 times better. Thus, the 3D robotic vision system of FIG. 2 can be more detectable, safer, and more precise than the flood illumination system of FIG. 1.
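By way of illustration only, the figures in the preceding example can be reproduced from the relationships given above, namely that the energy is N·S·T_p, the SNR is proportional to S²·N·T_p, and the precision is proportional to T_p divided by the square root of the SNR. The sketch below uses the amplitudes exactly as stated in the example (in amperes) and drops the unknown proportionality constants, which cancel in the ratios. The function names are hypothetical.

```python
import math

# Tradeoff sketch: proportionality constants are omitted because only
# ratios between the two illumination techniques are computed.

def energy(n_pulses: float, amplitude: float, pulse_width: float) -> float:
    """E = N * S * T_p."""
    return n_pulses * amplitude * pulse_width


def snr_metric(n_pulses: float, amplitude: float, pulse_width: float) -> float:
    """Quantity proportional to SNR: S^2 * N * T_p."""
    return amplitude**2 * n_pulses * pulse_width


def precision_metric(n_pulses: float, amplitude: float, pulse_width: float) -> float:
    """Quantity proportional to P: T_p / sqrt(SNR). Smaller is more precise."""
    return pulse_width / math.sqrt(snr_metric(n_pulses, amplitude, pulse_width))


flood = dict(n_pulses=20_000, amplitude=5e-3, pulse_width=20e-9)
dots = dict(n_pulses=5, amplitude=8.0, pulse_width=2e-9)

snr_gain = snr_metric(**dots) / snr_metric(**flood)                   # about 64
snr_gain_db = 10 * math.log10(snr_gain)                               # about 18 dB
energy_saving = energy(**flood) / energy(**dots)                      # about 25
precision_gain = precision_metric(**flood) / precision_metric(**dots) # about 80

print(f"SNR gain: {snr_gain:.1f}x ({snr_gain_db:.1f} dB)")
print(f"Energy reduction: {energy_saving:.1f}x")
print(f"Precision improvement: {precision_gain:.1f}x")
```

The precision improvement appears as a ratio of flood to discrete values because a smaller value of P corresponds to a more precise measurement.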

A multipath resolution algorithm may be used to identify objects in the scene. One such algorithm is the Multiple Signal Classification (MUSIC) algorithm, but other multipath resolution algorithms may be used. The MUSIC technique involves eigenvalue decomposition of the sample covariance matrix:

$\hat{R} = \left[ V_{s}\ V_{n} \right] \begin{bmatrix} \lambda_{s} & 0 \\ 0 & \lambda_{n} \end{bmatrix} \left[ V_{s}\ V_{n} \right]^{H}$

Eigenvalue decomposition forms an orthonormal basis that separates the signal space from the noise space, the two being orthogonal to each other. A one-dimensional searching (steering) vector is then formed:

$a\left( d_{k} \right) = \left[ e^{j\omega_{1}\frac{d_{k}}{c}},\ \ldots,\ e^{j\omega_{L}\frac{d_{k}}{c}} \right]$

The signal space, which is orthogonal to the noise subspace, can then be spanned by the steering vector, forming a MUSIC spectrum:

$P\left( d_{k} \right) = \frac{1}{\left| a\left( d_{k} \right) V_{n} \right|^{2}}$

Peaks in the MUSIC spectrum correspond to signal sources. The resolution depends upon the pulse rate of the signal source. For example, in FIG. 5, which is a graph 500 of distance versus normalized spectrum, peaks 510 and 520 can be resolved at 2 m and 2.3 m when the laser light is pulsed at 50 MHz. In FIG. 6, which is a graph 600 of distance versus normalized spectrum, pulsing the light at 20 MHz means the two sources cannot be resolved, but instead appear as a single source with a peak at 2.05 m. The robotic vision application may therefore determine how fast the laser needs to be pulsed. The laser array 310, however, can generate pulses at 500 MHz, allowing sufficiently high resolution for most applications at low cost. The MUSIC technique and other multipath resolution algorithms in general are well-known techniques and one of skill in the art would not need further description of them.
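By way of illustration only, a generic MUSIC range search can be sketched with NumPy as follows. This is a simulation under stated assumptions, not the processing performed by the described system and not the source of the graphs of FIG. 5 or FIG. 6. The assumptions are that the measurement frequencies are the first eight harmonics of a 50 MHz pulse rate, that the two sources have uncorrelated random amplitudes, that the noise is low, and that the conventional conjugated form of the steering vector product is used. All function names are hypothetical.

```python
import numpy as np

C = 3.0e8  # speed of light in m/s (rounded for the sketch)


def simulate(distances, omegas, num_snapshots, noise_std, rng):
    """Simulated snapshots: each source contributes a steering vector times a random amplitude."""
    steering = np.exp(1j * np.outer(omegas, np.asarray(distances)) / C)   # (L, K)
    shape_s = (len(distances), num_snapshots)
    amplitudes = (rng.standard_normal(shape_s) + 1j * rng.standard_normal(shape_s)) / np.sqrt(2)
    shape_n = (len(omegas), num_snapshots)
    noise = noise_std * (rng.standard_normal(shape_n) + 1j * rng.standard_normal(shape_n)) / np.sqrt(2)
    return steering @ amplitudes + noise


def music_spectrum(snapshots, omegas, num_sources, d_grid):
    """Evaluate 1 / |a(d)^H V_n|^2 on a grid of candidate distances."""
    num_freqs, num_snapshots = snapshots.shape
    covariance = snapshots @ snapshots.conj().T / num_snapshots   # sample covariance
    _, eigvecs = np.linalg.eigh(covariance)                       # eigenvalues ascending
    noise_subspace = eigvecs[:, : num_freqs - num_sources]        # smallest eigenvalues
    steering = np.exp(1j * np.outer(omegas, d_grid) / C)          # (L, G)
    projection = noise_subspace.conj().T @ steering
    return 1.0 / np.sum(np.abs(projection) ** 2, axis=0)


def top_peaks(spectrum, d_grid, count):
    """Return the distances of the `count` largest local maxima, sorted ascending."""
    inner = spectrum[1:-1]
    is_peak = (inner > spectrum[:-2]) & (inner >= spectrum[2:])
    idx = np.flatnonzero(is_peak) + 1
    best = idx[np.argsort(spectrum[idx])[-count:]]
    return np.sort(d_grid[best])


rng = np.random.default_rng(0)
omegas = 2 * np.pi * 50e6 * np.arange(1, 9)    # assumed: 8 harmonics of 50 MHz
d_grid = np.linspace(0.5, 4.0, 351)            # candidate distances, 1 cm steps
snapshots = simulate([2.0, 2.3], omegas, num_snapshots=200, noise_std=0.01, rng=rng)
spectrum = music_spectrum(snapshots, omegas, num_sources=2, d_grid=d_grid)
estimates = top_peaks(spectrum, d_grid, count=2)
print(estimates)
```

In this sketch the number of sources is supplied by the caller. A practical implementation would also need to estimate that number, for example from the spread of the eigenvalues.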

By combining a low-cost laser array with a low-cost optical element such as a DOE, and using a multipath resolution algorithm for processing the information received by a small pixel array, a high-resolution, low-power system for 3D robotic vision can be achieved.

FIG. 7 is a block diagram of a system 700 for robotic vision that incorporates the elements described above, according to one embodiment. In this system, the laser array 710 generates laser light pulses that pass through DOE 720, generating a virtual array of points that illuminates scene 730, which reflects light back to the pixel array 740. The pixel array 740 generates signals that are processed by processing element 750, which is programmed to use a multipath resolution algorithm. The processing element 750 can be a microprocessor, an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or any other desired type of processing element. Using this system, high-resolution 3D robotic vision can be achieved at a low cost and with low power requirements.

Modifications are possible in the described embodiments, and other embodiments are possible, within the scope of the claims. 

What is claimed is:
 1. A system for three-dimensional robotic vision, comprising: a light source array including at least three light sources; a diffractive optical element, configured to receive light from the light sources and project a spatial modulation pattern of light on a scene; an imaging lens, configured to receive light and project the received light onto an M×N pixel array, where M and N are each less than the respective corresponding dimension of the spatial modulation pattern; and a processing element coupled to the pixel array, and configured to generate a three-dimensional map of the scene responsive to information from the pixel array.
 2. The system of claim 1, wherein each of the light sources is configured to independently transmit light at the same wavelength.
 3. The system of claim 2, wherein the wavelength is in the infrared range.
 4. The system of claim 1, wherein the light source array generates light pulse sequences, each respective light pulse sequence generated at a different time.
 5. The system of claim 4, wherein the light source array includes lasers.
 6. The system of claim 1, wherein the optical element includes a diffractive optical element configured to receive light from the light source and diffract the light into the virtual array pattern.
 7. The system of claim 1, wherein the optical element includes a phased array.
 8. The system of claim 1, wherein the optical element includes an optical microelectromechanical system (MEMS).
 9. The system of claim 1, wherein the processing element is programmed with a multipath resolution algorithm.
 10. A method for three-dimensional robotic vision, comprising: transmitting light from a light source array that includes at least three light sources; receiving, by a diffractive optical element, light from the light sources; projecting a spatial modulation pattern of light onto a scene; receiving, at an imaging lens, light reflected back from the scene; projecting the light reflected back from the scene onto an M×N pixel array, where M and N are each less than the respective corresponding dimension of the spatial modulation pattern; and generating, by a processing element, a three-dimensional map of the scene using information received from the pixel array.
 11. The method of claim 10, wherein each of the light sources is configured to independently transmit light at the same wavelength.
 12. The method of claim 11, wherein the wavelength is in the infrared range.
 13. The method of claim 10, wherein the light source array generates light pulse sequences, each of the respective light pulse sequences generated at different times.
 14. The method of claim 13, wherein the light source array includes lasers.
 15. The method of claim 10, wherein the optical element includes a phased array.
 16. The method of claim 10, wherein the optical element includes an optical microelectromechanical system (MEMS).
 17. The method of claim 10, wherein the processing element is programmed with a multipath resolution algorithm. 